Design of an Agent Based Context Driven Focused Crawler
Abstract
A focused crawler downloads web pages that are relevant to a user-specified topic. Most existing focused crawlers are keyword driven and do not take into account the context associated with the keywords. As a result, they retrieve a large number of web pages regardless of whether those pages are logically related to the topic. A keyword-based strategy alone is therefore not sufficient for the design of a focused crawler, because context relevance matters more to the user’s requirement. This paper proposes the design of a context driven focused crawler (CDFC) that searches for and downloads only highly related web pages, thereby reducing network traffic. It also employs a category tree, a flexible user interface that shows the broad categories of topics on the web. Because CDFC downloads only relevant and credible documents, a comparatively small number, the proposed design significantly reduces the storage space required at the search engine side.
Index Terms: Search engine, Crawler, Hypertext Document System, Category Tree, Software Agents
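To make the contrast between keyword-driven and context-driven relevance concrete, here is a minimal Python sketch. It is not the CDFC design from the paper: the scoring rule, the threshold, the function names, and the sample pages are all assumptions made for illustration only.

```python
import re


def tokenize(text):
    """Lowercase the text and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))


def keyword_match(page_text, keyword):
    """Keyword-driven test: does the keyword occur on the page at all?"""
    return keyword.lower() in tokenize(page_text)


def context_score(page_text, context_terms):
    """Fraction of the (hypothetical) context terms that occur on the page."""
    terms = {t.lower() for t in context_terms}
    if not terms:
        return 0.0
    return len(tokenize(page_text) & terms) / len(terms)


def is_relevant(page_text, keyword, context_terms, threshold=0.5):
    """Accept a page only if the keyword is present AND enough context terms
    co-occur with it. The 0.5 threshold is an arbitrary illustrative value."""
    return (keyword_match(page_text, keyword)
            and context_score(page_text, context_terms) >= threshold)


# Invented sample pages: the same keyword used in two different contexts.
animal_page = "The jaguar is a large cat that hunts prey in the rainforest habitat."
car_page = "The new Jaguar sedan has a powerful engine and a luxury interior."
animal_context = ["cat", "prey", "habitat", "rainforest"]

for name, page in [("animal_page", animal_page), ("car_page", car_page)]:
    print(name,
          "keyword:", keyword_match(page, "jaguar"),
          "context-aware:", is_relevant(page, "jaguar", animal_context))
```

In this toy example both pages pass the keyword test, but only the page whose vocabulary matches the intended context passes the combined test, which is the kind of filtering that would let a crawler skip downloading the unrelated page.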
Similar resources
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, downloading domain-specific web pages is not a simple task. An unfocused approach often produces undesired results. Therefore, several new ideas have been proposed; a key technique among them is focused crawling, which is able to crawl particular topical...
Ontology Driven Focused Crawling of Web Documents
With the dynamism of the World Wide Web in recent years, discovering relevant web pages has become an important challenge. A focused crawler aims at selectively seeking out pages that are relevant to a pre-defined set of topics. Most of the current approaches perform syntactic matching, that is, they retrieve documents that contain particular keywords from the user’s query. This often leads t...
Focused Web Search Agents: A Critical Study
A software agent evidently decreases the work of a user and automates the process of information retrieval. With a wide variety of agents available, it is imperative to understand their capability in order to deduce their limitations and identify issues for the development of a new context focused web search crawler.
A Novel Framework for Context Based Distributed Focused Crawler (CBDFC)
Focused crawling aims to search only the subset of the WWW that is relevant to a specific topic of user interest. This makes it necessary to decide on the relevancy of a document to that topic, especially when the user cannot specify the exact context of the topic. This paper provides a novel framework of a context based distributed focused crawler that maintains an index ...
Self Ranking and Evaluation Approach for Focused Crawler Based on Multi-Agent System
The need for a better way of retrieving information and of dealing with the increasing complexity and volume of information for users is an important research theme. Retrieving information from the WWW via search engines may be considered the most significant one. Most of the recent efforts in this area suggest better solutions for the limitations of general-purpose search engines. That...